ExecSolib strategy: make ddtrace.so directly executable#3711
cataphract wants to merge 33 commits into master
✅ Tests: all green. ❄️ No new flaky tests detected. Commit SHA: eece5c4
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
##           master    #3711   +/-   ##
Coverage   68.78%   68.78%
Files         166      166
Lines       19015    19015
Branches     1792     1792
Hits        13079    13079
Misses       5124     5124
Partials      812      812
67f74d6 to 83c3f82
Introduce the ExecSolib spawn strategy by embedding an ELF entry point (_dd_solib_start) into ddtrace.so itself.
But the import can't be declared hidden. In the end the symbol will be in the GOT, but the linker emits a RELATIVE reloc (not GLOB_DAT), so it should work with our self-relocation.
b3ce66c to 9720a80
The x86-64 inline asm restoring the kernel stack and jumping to ld.so:
"mov %[sp], %%rsp\n"
"xor %%edx, %%edx\n" // required: rdx = 0 for ld.so startup ABI
"jmpq *%[entry]\n"
GCC at -O0 allocated %[entry] (ldso_entry) to rdx, causing the xor to
zero the jump target before the jmpq executed → SIGSEGV at address 0x0
on every x86-64 ExecSolib launch.
The fix is to pin ldso_entry to rax via the "a" constraint. Using the
"rdx" clobber alone is not sufficient: GCC is permitted to allocate
input operands into clobbered registers because inputs are consumed
before the asm fires. A specific register constraint ("a" = rax) is
the correct and optimization-safe solution.
With the fix, GCC emits:
mov %rcx, %rsp ; stack_top in rcx (or any non-rax "r")
xor %edx, %edx ; zero rdx (harmless: entry is in rax)
jmpq *%rax ; jump to ldso_entry
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
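As a minimal sketch of the register-pinning idea (not the PR's actual code), the hypothetical helper below zeroes rdx inside the asm and then reads an input operand that is pinned to rax with the "a" constraint. The function name and the fallback for other architectures are assumptions made for illustration; the real code jumps to ld.so instead of returning.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns `value` after the asm has zeroed rdx. */
uint64_t sketch_read_after_zeroing_rdx(uint64_t value) {
#if defined(__x86_64__)
    uint64_t out;
    __asm__ volatile(
        "xor %%edx, %%edx\n\t"   /* zero rdx, as the ld.so hand-off does */
        "mov %[in], %[out]\n\t"  /* the input is still intact in rax */
        : [out] "=r"(out)
        : [in] "a"(value)        /* "a" pins the input operand to rax */
        : "rdx", "cc");          /* rdx and the flags are overwritten */
    return out;
#else
    return value;                /* fallback so the sketch builds elsewhere */
#endif
}
```

Because the operand lives in rax, the xor on edx cannot touch it, which is the property the fix relies on.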
0f21818 to 82345e8
Benchmarks [ tracer ]
Benchmark execution time: 2026-04-10 17:06:18
Comparing candidate commit eece5c4 in PR branch
Found 0 performance improvements and 2 performance regressions! Performance is the same for 190 metrics, 2 unstable metrics.
scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache
scenario:SamplingRuleMatchingBench/benchRegexMatching3
76bb66d to 4e0b02e
4e0b02e to 5d0650b
a485ef4 to b6a70af
4cffacf to 9293858
6e1977d to 09acb37
09acb37 to 658c99a
…n detection Update libdatadog submodule to fix container ID extraction when running under Podman with cgroupns=host. The container cgroup path includes a /container subdirectory after the .scope suffix (e.g. 0::/machine.slice/libpod-HEXID.scope/container), which the previous regex did not handle. This caused origin detection to fail: no entity ID was sent to the agent, so container tags were missing from APM traces. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
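To illustrate the cgroup path shape described above, here is a rough sketch that pulls the ID out of a libpod scope and tolerates a trailing path component such as /container. It is not the libdatadog regex; the function name and the plain string scanning are assumptions chosen for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the hex ID found between "libpod-" and
 * ".scope" into `out`. Returns 1 on success, 0 otherwise. */
int sketch_extract_libpod_id(const char *cgroup_line, char *out, size_t out_len) {
    assert(cgroup_line != NULL && out != NULL);
    const char *start = strstr(cgroup_line, "libpod-");
    if (start == NULL) return 0;
    start += strlen("libpod-");
    const char *end = strstr(start, ".scope");
    if (end == NULL || end == start) return 0;
    /* After ".scope" accept end of line or a further path component,
     * e.g. "/container" under Podman with cgroupns=host. */
    const char *rest = end + strlen(".scope");
    if (*rest != '\0' && *rest != '\n' && *rest != '/') return 0;
    size_t len = (size_t)(end - start);
    if (len + 1 > out_len) return 0;
    for (size_t i = 0; i < len; i++) {
        if (!isxdigit((unsigned char)start[i])) return 0;
    }
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

A line such as 0::/machine.slice/libpod-0123abcd.scope/container yields 0123abcd, with or without the /container suffix.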
The trampoline binary embedded in ddtrace.so was produced as ET_EXEC (non-PIE) by toolchains that don't default to -fPIE (e.g. devtoolset-7 on CentOS 7). elf_load_trampoline accepted only ET_DYN and used mmap(NULL) to pick a random load base; an ET_EXEC binary loaded that way crashes because its absolute virtual addresses no longer match.
Two-pronged fix:
1. libdatadog/spawn_worker/build.rs: add -fPIE/-pie on Linux so the trampoline is always ET_DYN, matching the original design intent.
2. solib_bootstrap.c: add a __builtin_trap() guard after the ET_DYN check so a mis-built ET_EXEC trampoline aborts loudly instead of silently misbehaving.
Fixes "failed to map trampoline" (exit 121) on bookworm-slim for PHP 8.3-8.5.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
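The ET_DYN versus ET_EXEC distinction can be sketched as a small header check. This is an illustration only, not the code in solib_bootstrap.c: the function names and the SKETCH_ constants are assumptions, and the sketch reads e_type in host byte order, so it assumes the image matches the host's endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF e_type values, defined locally so the sketch does not need elf.h. */
enum { SKETCH_ET_EXEC = 2, SKETCH_ET_DYN = 3 };

/* Returns e_type from an in-memory ELF image, or -1 if it is not ELF.
 * e_type is the 16-bit field right after the 16-byte e_ident array. */
int sketch_elf_type(const unsigned char *image, size_t len) {
    assert(image != NULL);
    if (len < 18 || memcmp(image, "\177ELF", 4) != 0) return -1;
    uint16_t e_type;
    memcpy(&e_type, image + 16, sizeof e_type);
    return e_type;
}

/* Only position-independent (ET_DYN) images may be mapped at a base
 * chosen by mmap(NULL, ...); ET_EXEC images carry absolute addresses. */
int sketch_can_load_at_random_base(const unsigned char *image, size_t len) {
    return sketch_elf_type(image, len) == SKETCH_ET_DYN;
}
```

A loader that gets 0 back from the second function would refuse the image (or trap) instead of mapping it at an arbitrary address.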
1f01b1d to 59abc04
…ssion Update libdatadog submodule to include the fix for pecl-installed extensions: access(X_OK) check before ExecSolib execve, falling back to FdExec (fexecve via trampoline) when the +x bit is absent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
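The fallback described above might look roughly like the following. The enum and function names are hypothetical and the real decision lives in libdatadog; only access(2) with X_OK is taken from the commit message.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical names for the two spawn strategies mentioned above. */
typedef enum { SKETCH_EXEC_SOLIB, SKETCH_FD_EXEC } sketch_strategy;

/* Use ExecSolib only when the library file is executable by the caller;
 * otherwise fall back to the FdExec path. */
sketch_strategy sketch_pick_strategy(const char *lib_path) {
    assert(lib_path != NULL);
    return access(lib_path, X_OK) == 0 ? SKETCH_EXEC_SOLIB : SKETCH_FD_EXEC;
}
```

A pecl-installed ddtrace.so without the +x bit would take the SKETCH_FD_EXEC branch here.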
…to RUST_FILES
dlopen()/dlsym() crash when called from a shared library that is exec'd as the main program via ld.so: glibc's __libc_start_main never runs, so the internal dynamic-linker state is uninitialised.
Replace the dlsym(RTLD_DEFAULT, symbol) lookup with direct extern weak references to ddog_daemon_entry_point / ddog_crashtracker_entry_point. Since ssi_entry.c is compiled into libddtrace_php.so, both symbols live in the same binary and are resolved at link time, so no runtime dl machinery is needed.
Also correct the argv-layout comment: ld.so strips its own path before calling the entry point, so the program sees argv[0]=lib_path (not ld_path).
Separately, add libdd-shared-runtime to the RUST_FILES brace-expansion in the Makefile so that pecl/loader tests rebuild when that crate changes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…i_entry.c
Three bugs in the ExecSolib SSI sidecar startup path, found by empirical testing with minimal probe programs on ubuntu-vm:
1. Stack misalignment (root cause of the original crash)
At process entry the kernel sets rsp % 16 == 0. x86-64 SysV ABI requires rsp % 16 == 8 at C function entry (as if 'call' pushed a return address). Jumping to ssi_main without adjustment meant any SSE instruction requiring 16-byte alignment (e.g. 'movaps' inside pthread_mutex_lock, which dlsym calls) would SIGSEGV.
Fix: add 'and $-16, %rsp; sub $8, %rsp' before 'jmp ssi_main'.
2. .init_array not called
ld.so's _dl_init skips .init_array for the main executable (l_name="" && l_type=lt_executable; confirmed in glibc dl-init.c:46-48). It expects __libc_start_main to handle it, but we never call that. Rust runtime initialisation (allocator, TLS, panic hooks, ...) lives in .init_array. Without it, calling a Rust entry point would crash.
Fix: run_own_init_array() walks _DYNAMIC to find DT_INIT/DT_INIT_ARRAY and calls them before entering Rust. Load base comes from __ehdr_start (linker symbol at VMA 0 of the DSO).
3. dlsym replaced with direct extern weak references
dlsym would work with a properly aligned stack, but is unnecessary since ssi_entry.c is compiled into the same libddtrace_php.so as ddog_daemon_entry_point / ddog_crashtracker_entry_point. Direct extern weak references are resolved at link time, which is simpler and has no runtime overhead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
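The stack adjustment from the first fix can be checked with plain integer arithmetic. The helper below is a hypothetical model of 'and $-16, %rsp; sub $8, %rsp', not code from ssi_entry.c.

```c
#include <assert.h>
#include <stdint.h>

/* Models the two-instruction fix on a stack pointer value. */
uintptr_t sketch_adjust_entry_sp(uintptr_t sp) {
    sp &= ~(uintptr_t)15;  /* and $-16, %rsp: round down to a 16-byte boundary */
    sp -= 8;               /* sub $8, %rsp: mimic the return address a call pushes */
    assert(sp % 16 == 8);  /* the alignment a C function expects on entry */
    return sp;
}
```

For example, an entry value of 0x7ffd1000 becomes 0x7ffd0ff8, which satisfies rsp % 16 == 8.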
Both ddog_daemon_entry_point and ddog_crashtracker_entry_point are always present in libddtrace_php.so; weak declarations would silently produce a NULL call instead of a link error if they were ever missing. Use plain extern declarations so the linker catches absent symbols at build time. Also remove the now-disproved claim that dlsym cannot work from the entry point — the actual issue was stack misalignment, not missing glibc init. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
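The difference between the two declaration styles can be seen in a small sketch. The symbol name below is hypothetical and deliberately left undefined; with GCC or Clang on Linux, the weak declaration lets the program link anyway and the symbol's address evaluates to NULL at run time, whereas a plain extern declaration of the same symbol would fail at link time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol that no object file defines. Because it is weak,
 * the link still succeeds and the address resolves to NULL. */
extern void sketch_missing_entry_point(void) __attribute__((weak));

/* Returns 1 if the weak symbol was resolved, 0 if it is absent. */
int sketch_entry_point_present(void) {
    return sketch_missing_entry_point != NULL;
}
```

Calling sketch_missing_entry_point() without this check would jump to address 0, which is the silent failure mode the commit avoids by using plain extern declarations.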
…p) signature Per ELF spec / glibc ldsodefs.h dl_init_t, .init_array constructors receive (int argc, char **argv, char **envp). Previously they were called as void(*)(void), which works in practice on SysV ABIs (extra args in registers, callee ignores them) but is technically UB. Pass the real values; envp is derived from the initial stack layout (argv + argc + 1) before argc/argv are adjusted. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
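A small model of the envp derivation and the three-argument constructor call may help. All names here are hypothetical; the sketch only assumes the initial stack layout described above, where argv is followed by a NULL terminator and then the environment pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Constructor signature used for .init_array entries. */
typedef void (*sketch_init_fn)(int argc, char **argv, char **envp);

/* envp starts right after argv's NULL terminator on the initial stack. */
char **sketch_envp_from_stack(int argc, char **argv) {
    assert(argv != NULL && argv[argc] == NULL);
    return argv + argc + 1;
}

/* Calls each constructor with the real argc/argv/envp values. */
void sketch_run_init_array(sketch_init_fn *fns, size_t count,
                           int argc, char **argv, char **envp) {
    for (size_t i = 0; i < count; i++) {
        fns[i](argc, argv, envp);
    }
}

/* Sample constructor that records what it was given. */
int sketch_seen_argc = -1;
void sketch_sample_ctor(int argc, char **argv, char **envp) {
    (void)argv;
    (void)envp;
    sketch_seen_argc = argc;
}
```

With argc equal to 2, envp points at the fourth slot of the pointer array, and the sample constructor observes argc as 2.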
Some clang configurations (cc crate passes --target=x86_64-unknown-linux-gnu which changes header search paths) cannot find system elf.h, causing build failures in CI (11 compile errors: ElfW undeclared, _DYNAMIC undeclared, etc.). Replace ElfW(Dyn) / DT_* with a self-contained SsiDyn struct and inline #define constants. ssi_entry.c targets Linux LP64 only (x86-64 + aarch64), so intptr_t/uintptr_t match the 64-bit ELF Dyn layout exactly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
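A self-contained replacement for ElfW(Dyn) could look roughly like this. The commit only names the SsiDyn struct, so the field names, the SKETCH_ constants, and the lookup helper below are assumptions; the tag values are the standard ELF ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard ELF dynamic tags, defined inline instead of taken from elf.h. */
#define SKETCH_DT_NULL         0
#define SKETCH_DT_INIT         12
#define SKETCH_DT_INIT_ARRAY   25
#define SKETCH_DT_INIT_ARRAYSZ 27

/* Same layout as a native-width ELF Dyn entry: a signed tag followed by
 * a value or pointer of the same width. */
typedef struct {
    intptr_t  d_tag;
    uintptr_t d_val;
} SketchDyn;

_Static_assert(sizeof(SketchDyn) == 2 * sizeof(uintptr_t),
               "SketchDyn must be two machine words");

/* Walks a DT_NULL-terminated dynamic table; returns 1 and stores the
 * value in *out when `tag` is found, 0 otherwise. */
int sketch_find_dyn(const SketchDyn *dyn, intptr_t tag, uintptr_t *out) {
    assert(dyn != NULL && out != NULL);
    for (; dyn->d_tag != SKETCH_DT_NULL; dyn++) {
        if (dyn->d_tag == tag) {
            *out = dyn->d_val;
            return 1;
        }
    }
    return 0;
}
```

On LP64 targets the struct is 16 bytes, matching the 64-bit ELF Dyn layout the commit message refers to.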
0bd0d33 to 83f437f
libdd-libunwind-sys/build.rs uses #[path = "buildscript/linux.rs"] to include
platform-specific build logic, but the RUST_FILES find filter in the Makefile
only matched */src*, */build.rs, etc. — not */buildscript*.
The generated pecl .tgz therefore lacked buildscript/{linux,macos,windows}.rs,
causing "couldn't read libdd-libunwind-sys/buildscript/linux.rs: No such file"
when pecl install tried to compile the Rust code.
Add -path "*/buildscript*" to the find filter.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
83f437f to eece5c4
Introduce the ExecSolib spawn strategy by embedding an ELF entry point (_dd_solib_start) into ddtrace.so itself.
ddtrace.so becomes a PIE executable and runs without the dynamic linker. After self-relocation:
Tested on glibc and musl on Linux aarch64.